DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING **EXAMINATIONS 2005** 

MSc and EEE/ISE PART IV: MEng and ACGI

**Corrected Copy** 

in toward.

#### SYNTHESIS OF DIGITAL ARCHITECTURES

Monday, 16 May 10:00 am

Time allowed: 3:00 hours

There are FIVE questions on this paper.

**Answer THREE questions.** 

All questions carry equal marks

Any special instructions for invigilators and information for candidates are on page 1.

Examiners responsible

First Marker(s):

G.A. Constantinides

Second Marker(s): K. Masselos

This question relates to the code for the function **func**, shown below, where the function exp(x) returns the value  $e^x$ .

```
func( a, b, c )
{
  return (a + 63*b) + exp(c);
}
```

Name of

To implement this behaviour, you have the resource set  $R = \{add, mult, ROM\}$ , with area costs of 1, 2, and 10 units area, and delays of 1, 2, and 1 cycle, respectively. You may also use the definition of the Legendre polynomials over the range [-1,1] given in (Equation 1.1).

$$\phi_i(x) = \frac{1}{2^i i!} \frac{d^i}{dx^i} \{ (x^2 - 1)^i \}$$
 (Equation 1.1)

- a) Draw the CDFG for function func, leaving any function calls un-expanded.
  - Assuming the away I function is implemented as a ROM lookup perform an ASAP
- b) Assuming the **exp(.)** function is implemented as a ROM lookup, perform an ASAP scheduling on your CDFG. State the scheduled start time of each node, and the overall latency.

  [3]
- c) Perform a resource binding on your scheduled CDFG. State the resulting binding function, and calculate the resulting area of the circuit. State any assumptions you make.
- d) Given that the parameter **c** is known to lie in the range [-1, 0], construct a 2nd order uniformly-weighted least-squares polynomial approximation to **exp(c)**,  $g(x) = a_0 + a_1 x + a_2 x^2$ . State the values of  $a_0$ ,  $a_1$ , and  $a_2$ . [5]
- e) Apply strength reduction to the code, and construct the modified CDFG of the strength-reduced code, once the call to **exp(c)** has been replaced by a Horner's scheme evaluation of the appropriate approximation.
- f) Perform an ASAP scheduling and resource binding on the modified CDFG. State the start time of each node, and the resulting minimum area of the circuit.

[3]

[2]

[3]

The notation in Table 2.1 applies to this question.

Table 2.1

| Symbol         | Meaning                                                                         |  |  |  |
|----------------|---------------------------------------------------------------------------------|--|--|--|
| $\overline{V}$ | vertex set of the CDFG                                                          |  |  |  |
| E              | edge set of the CDFG                                                            |  |  |  |
| $\overline{R}$ | resource type set                                                               |  |  |  |
| $\frac{S}{Y}$  | scheduling function                                                             |  |  |  |
| $\overline{Y}$ | binding function                                                                |  |  |  |
| $x_{vtir}$     | binary decision variable with $x_{vtir} = 1$ iff $S(v) = t$ and $Y(v) = (r,i)$  |  |  |  |
| hir            | binary decision variable with $b_{ir} = 1$ iff instance i of resource type r is |  |  |  |
|                | used                                                                            |  |  |  |
| $a_r$          | an upper bound on the number of resources of type $r \in R$                     |  |  |  |
| $ASAP_{v}$     | the ASAP start time of node $v \in V$ when all nodes have minimum latency       |  |  |  |
| $ALAP_{\nu}$   | the ALAP start time of node $v \in V$ when all nodes have minimum latency       |  |  |  |
| T(v)           | the resource type set for node $v \in V$                                        |  |  |  |
| $c_r$          | the cost of resource type $r \in R$                                             |  |  |  |
| $d_r$          | the delay of resource type $r \in R$                                            |  |  |  |
| $d_{\min v}$   | the minimum delay of node $v \in V$                                             |  |  |  |

a) Formulate the minimum-cost latency-constrained combined scheduling, binding, and module selection problem as an ILP in the variables  $\{x_{vtir}\}$  and  $\{b_{ir}\}$ .

[5]

b) Construct the CDFG for the code shown in Figure 2.2.

[1]

c) By substituting appropriate values into your formulation from part (a), show that the ILP for this CDFG, when subject to an overall latency constraint of 2 cycles, simplifies to (Figure 2.2). Ensure you clearly show how each inequality is formed. (You may make the assumptions shown in Table 2.4)

[8]

d) By enumeration of the possible variable values, or otherwise, derive all feasible solutions to the ILP shown in Figure 2.3, and identify the optimum.

[6]

## [Question 2 continues on the next page]

#### [Question 2 continued]

```
p = x + 2;  // operation a
q = x + x;  // operation b
r = p * 2;  // operation c
```

Figure 2.2

```
minimize: 4b_{1,\text{mult}} + 2b_{1,\text{ripple}} + 3b_{1,\text{lookahead}} subject to: x_{a,0,1,\text{lookahead}} = 1 x_{b,0,1,\text{ripple}} + x_{b,0,1,\text{lookahead}} + x_{b,1,1,\text{lookahead}} = 1 x_{c,1,1,\text{mult}} = 1 x_{a,0,1,\text{lookahead}} + x_{b,0,1,\text{lookahead}} \leq b_{1,\text{lookahead}} x_{b,1,1,\text{lookahead}} \leq b_{1,\text{lookahead}} x_{b,0,1,\text{ripple}} \leq b_{1,\text{ripple}} x_{c,1,1,\text{mult}} \leq b_{1,\text{mult}}
```

Figure 2.3

Table 2.4

| Parameter                | Value                                                                                                                           |
|--------------------------|---------------------------------------------------------------------------------------------------------------------------------|
| R                        | {mult, ripple, lookahead}                                                                                                       |
| $c_{ m mult}$            | 4                                                                                                                               |
| Cripple                  | 2                                                                                                                               |
| Clookahead               | 3                                                                                                                               |
| $d_{ m mult}$            | 1                                                                                                                               |
| $d_{\text{ripple}}$      | 2                                                                                                                               |
| $d_{\mathrm{lookahead}}$ | 1                                                                                                                               |
| $a_{ m mult}$            | 1                                                                                                                               |
| $a_{\text{ripple}}$      | 1                                                                                                                               |
| $a_{ m lookahead}$       | 1                                                                                                                               |
| T(v)                     | $\{\text{mult}\}\ \text{if } v \text{ is a multiplication}$ $\{\text{ripple, lookahead}\}\ \text{if } v \text{ is an addition}$ |

This question concerns the code shown below. The behaviour is to be implemented using the resource types: multiplier, subtractor. A multiplier takes two cycles to execute, an adder takes one cycle. Both read and write operations take no cycles.

```
Subtractor
func(p,q) {
    if( p > 0 )
        return (q*p)*(1 - p);
    else
        return 0;
}

main( ) {
    read r;
    read x;
    for i = 1 to N {
        a = func(x,r);
        x = 2*a;
    }
    write x;
}
```

3

a) Draw the CDFG for this code, representing read and write calls as read and write nodes, respectively.

[3]

You may take a compositor to be a zero-latercy operation.

b) Perform an ASAP scheduling on this CDFG. For nodes with more than one execution time, express the times in terms of i and/or N, as appropriate.

[3]

c) Construct the resource conflict graphs for the scheduled CDFG, and colour them to obtain a resource binding.

[2]

d) Construct the register conflict graph for this scheduled CDFG, and state whether the graph is an interval graph (you may exclude the variable i from consideration).

[3]

e) Colour the register-conflict graph using 4 colours.

[3]

f) By reference to clique-number, show that the register-conflict graph cannot be coloured using 3 colours.

[3]

g) Draw the complete datapath resulting from your scheduling, resource binding, and register sharing (you may exclude reads and writes).

[3]

4

This question concerns the following code, to be implemented using adder/subtractor resources, which take one cycle each to execute.

p = x + y; // operation a
q = p - 2; // operation b
r = p + 2; // operation c
s = q - y; // operation d
t = q + y; // operation e
u = r - x; // operation f
v = x + x; // operation g

a) Construct the CDFG for this code, labelling all the vertices with their operation letter.

[2]

b) What is the minimum overall latency required to execute this code?

[1]

c) Name an appropriate algorithm to schedule this code under a given upper bound on the number of adder/subtractors to be used.

[1]

d) Schedule the code using the algorithm from part (c), for all positive integer values of the upper bound. Break ties in urgency by choosing the operation in order of the alphabet (e.g. **a** before **b**). State the start time of each operation for all values of the bound.

[3]

e) Name an appropriate algorithm to schedule this code given that it must complete within a given upper bound on the number of cycles.

[1]

t) Schedule the code using the algorithm from part (e), for all integer values of the bound greater than or equal to the minimum overall latency. Break ties in urgency by choosing the operation in order of the alphabet (e.g. **a** before **b**). State the start time of each operation for all values of the bound.

[3]

g) Draw an area/latency design-space for your behaviour, clearly identifying all the feasible schedules you have constructed from parts (d) and (f).

[2]

h) Define the term "inferior design", and identify any inferior designs in your design-space.

[1]

## [Question 4 continues on the next page]

# [Question 4 continued]

| i) Is one of the two algorithms used more likely to produce inferior designs? I  |     |  |
|----------------------------------------------------------------------------------|-----|--|
| explain your answer.                                                             | [3] |  |
| j) Define the term "Pareto-optimal". Identify any Pareto-optimal designs in your |     |  |
| design-space.                                                                    | [1] |  |
| k) Will one of the two algorithms always produce Pareto-optimal designs? Expla   |     |  |
| your answer.                                                                     | [2] |  |

This question concerns the following scheduled and bound code, in which all operations take a single cycle to execute. **x** and **y** are the 8-bit inputs to this circuit, and **t4**, **t5**, and **t6** are the 16-bit outputs. The controller of this circuit consumes a significant proportion of the overall area.

```
t1 = x + y;  // cycle 0, resource (+,1)
t3 = x * y;  // cycle 0, resource (*,1)
t2 = t1 * t3;  // cycle 1, resource (*,1)
t4 = t1 + t3;  // cycle 1, resource (+,1)
t5 = t2 / t4;  // cycle 2, resource (/,1)
t6 = w * z;  // cycle 2, resource (*,1)
```

5

A register sharing is performed such that register **p** is used to store both **t1** and **t2**, whereas register **q** is used to store **t3**, **t4**, and **t5**. Result **t6** has a dedicated register.

a) Define the term "resource dominated", and state whether this circuit is resource dominated.

[2]

b) Figure 5.1 shows a partially completed ROM table for a horizontal microcode implementation. State the value of N, the final address, and complete the table in binary.

[5]

| Address | Select<br>lines at<br>input to<br>(+,1) | Select<br>lines at<br>input to<br>(*,1) | Select lines at input to register <b>p</b> | Select<br>lines at<br>input to<br>register <b>q</b> | Enable line for register <b>p</b> | Enable line for register | Enable line for register <b>t6</b> |
|---------|-----------------------------------------|-----------------------------------------|--------------------------------------------|-----------------------------------------------------|-----------------------------------|--------------------------|------------------------------------|
| 0       |                                         |                                         |                                            |                                                     |                                   | <del> </del>             | <u> </u>                           |
|         |                                         |                                         |                                            |                                                     |                                   |                          |                                    |
| N       |                                         |                                         |                                            |                                                     |                                   | <u> </u>                 | <u> </u>                           |

Figure 5.1

c) Remove any duplicate control lines (those for which there is another control line with which they have equal value at all time instants) and superfluous control lines (those that have the same value for all time), to obtain a more compact horizontal microcode representation. Explain the steps in your answer.

[3]

d) Re-design your controller using the principle of distributed control, stating the contents of each ROM. You may use horizontal microcode. State clearly any reasons behind design choices you have made.

[7]

e) State Rent's rule. Given that the circuit has gate-level Rent constants  $\beta = 0.5$ , K = 2.0, estimate the number of gates in the circuit.

[3]

E443 Jsc 443 Auj

( a)



Overly latery = 4 years

Assuming resource-dominated, area = 1+2+10 = 13.

d) Need to get it range (-1,1)
$$c \in [-1,0] = 2c+1 \in [-1,1]$$
Let  $x = 2c+1$ 
We with express  $f(x) = \exp\left(\frac{1}{2}(x)\right)$ 

We with expres 
$$f(x) = \exp\left(\frac{1}{2}(x-1)\right)$$
 as weighted sum of legale polynomids.

$$\phi_o(x) = 1$$

$$\psi_{i}(x) = \frac{1}{2} \frac{d}{dx} \left\{ x^{2} - i \right\}$$

$$= xi$$

$$\phi_{2}(x) = \frac{1}{(4)(2)} \frac{d^{2}}{dx^{2}} \left\{ (2^{2} - 1)^{2} \right\}$$

$$= \frac{1}{8} \frac{d^{2}}{dx^{2}} \left\{ x^{4} - 2x^{2} + 1 \right\}$$

$$= \frac{1}{8} \frac{d}{dx} \left\{ 4x^{3} - 4x^{2} \right\}$$

$$= \frac{1}{2} \left( 3x^{2} - 1 \right)$$

$$\langle \phi_c, \phi_c \rangle = \int_{-1}^{1} (dx = 2)$$

$$\langle \phi, \phi, \rangle = \int x^2 dx = \left[\frac{1}{3}x^3\right] = \frac{2}{3}$$

$$\langle \phi_{2}, \phi_{2} \rangle = \frac{1}{4} \int_{-1}^{1} (3x^{2} - 1)^{2} dx$$

$$= \frac{1}{4} \int_{-1}^{1} (4x^{4} - 6x^{2} + 1) dx = \frac{1}{4} \int_{-1}^{1} \frac{1}{5}x^{5} - 2x^{3} + 2 \int_{-1}^{1}$$

$$= \frac{1}{2} \left[ \frac{4}{5} - 2 + 1 \right]$$

$$= \frac{215}{2} \left[ e^{\frac{1}{2}x} \right]_{-1}^{1} = 2e^{-\frac{1}{2}} \left( e^{\frac{1}{2}} - e^{-\frac{1}{2}x} \right)$$

$$= 2e^{-\frac{1}{2}} \left[ e^{\frac{1}{2}x} \right]_{-1}^{1} = 2e^{-\frac{1}{2}} \left( e^{\frac{1}{2}} - e^{-\frac{1}{2}x} \right)$$

$$= 2 \left( 1 - e^{-1} \right)$$

$$= 2 \left( 1 - e^{-1} \right)$$

$$= 2 \left( 3e^{-1} - 1 \right)$$

$$= 2 + 2e^{-0} - 4 + 4e^{-1}$$

$$= 2 \left( 3e^{-1} - 1 \right)$$

$$= 2 + 2e^{-0} - 4 + 4e^{-1}$$

$$= 2 \left( 3e^{-1} - 1 \right)$$

$$= 2 + 2e^{-1} \left( 3x^{2} - 1 \right) e^{\frac{1}{2}(x - 1)} dx = \frac{3}{2} \int_{-1}^{3x^{2}} e^{\frac{1}{2}(x - 1)} dx$$

$$= \frac{3}{4} \int_{-1}^{3x^{2}} e^{\frac{1}{2}(x - 1)} dx$$

$$\begin{array}{lll}
(1, \psi_{2}) &= & 3 & 2x^{2}e^{\frac{1}{2}(x-1)} \\
&= & 3 & 4xe^{\frac{1}{2}(x-1)} \\
&= & 3 & 4xe^{-1} \\
&= &$$

$$g(x) = a_0 + x(a_1 + a_2 x)$$



Clearly no # Couplists
Also no + couplists

Area = 1+2 = 3, under the assurption of resource domination.

$$\forall v \in V$$
,  $\sum_{r \in T(v)} \frac{a_r}{i=1} \frac{A L A R_v - d_r + d_{min} v}{\sum_{r \in T(v)} \frac{a_r}{i=1} \frac{a_r}{\sum_{r \in A S A R_v} \frac{a_r}{\sum_{r \in T(v)} \frac{a_r}{i=1} \frac{a_r}{\sum_{r \in A S A R_v} \frac{a_r}{\sum_{r \in T(v)} \frac{a_r}{i=1} \frac{a_r}{i=1$ 

$$\frac{d(v',v) \in \mathcal{E}}{2}$$

$$\frac{dv}{2} = \frac{dv}{2} + \frac{dv}{2} + \frac{dv}{2} = \frac{dv}{2} = \frac{dv}{2} + \frac{dv}{2} = \frac{dv}{2}$$



Contraint set (1):

$$\forall v \in \{a,b,c\}$$
  $\sum_{i=1}^{a_r} \frac{AvAP_v - dr + drinv}{t = AsAP_v} = 1$ 

Need ASAP & ALAP:

| V | AS AP(U) | ACAP(0) |
|---|----------|---------|
| ٤ | 0        | D       |
| Ь |          |         |
| Č |          | 1       |

ie 
$$X_{\alpha,0,1}$$
, both about = 1 (A)

(b) 
$$\sum_{r \in \{r, p, p, l\}} \frac{1 - dr + 1}{\sum_{r \in \{r, p, p, l\}}} \sum_{t=0}^{|r-dr|+1} \frac{1}{t}$$

ie. 
$$x_{b,0,1}$$
, bothaled +  $x_{b,1,1}$ , bothaled +  $x_{b,0,1}$ , while = 1 (4)

(c) 
$$= \frac{1}{\sum x_{c,t,1,mult}} = 1$$

i.e. 
$$X_{c,1,1,mld} = 1$$
 (B)

Ad (2) keens

Ht € {0, 1}, Hr € {nyple, bothshead, will }, &

Z Xyt,1,r Sbi,r VE {a,b,c}: reT(v) t'e(t,...,t+dr-13n iAAA,..,ALARV-dr+domv}

(t=0, r=riphe) pains have non-ength time intersections Σ Σ Φν, t', 1, riple νε {a, b3 t ε {0, 13 η ξASAPu}, ..., ALAPU-13 5 biripple (c) => xb,0,1,ripple 5 bi, ripple (t = 0, r = lookahend) Z 2v,b,1, lockahen 5 b, bothabend vie {a,b} (0) Xa, c, 1; lookabeed + Xb, c, 1, lookabeed < b, lookabeed (t=1, r= lookalend) VE {a,b} t'E {t3 \ {ASAPU, ..., ALARI} 5 b, lookahead (€) => xb,1,1, bother < b, bothbal (t=1, r=mult) (F) x c, 1, 1, mile & b., mult

We have  $\forall a, 0, 1, both dead = 1$ =>  $b_1, both dead = 1$  (0)

& 
$$x_{c,1,1}$$
, but = 1  
=>  $b_{1,null}$  = 1 (F)

worth envients are (V1) (V2) (V3) (V4) b1, ripple; 26,0,1, ripple; 26,0,1, balled; 261,1, Cadalad Vorable 1 FEADIBLE? COBJECTIONS CONSTRAZIOT 7+2b, riggle  $\sqrt{I}$ 13 14 V2 ED & E  $\bigcirc$ 0  $\bigcirc$  $\odot$ X 7  $\bigcirc$ 0 - $\bigcirc$ X  $\mathcal{O}$ #V 0 Χ 0 1 0 Χ 0 ( X 0  $\bigcirc$ χ X χ  $\bigcirc$  $\bigcirc$ X X χ 0 χ 0 X Y X OX 0 0  $\circ$ 0 X ( 0  $\bigcirc$  $\bigcirc$ X  $\chi$ 0 Χ χ χ 0

Contibo A, B, E, F are always satisfied.

Optimum selve is with  $b_1$ , ripple=0,  $a_b$ , 0, 1, ripple=0,  $x_b$ , 0, below  $a_b$ , 1, 1, belowed=1. Optimum objetie=7.

Sultander

Claur=0

. HPGJ

d) Given prog the label to be 1-p the label to, be we have milled

t, ; t2; r; x; a

| Vorube<br>t, | (yeles) Produced<br>6(i-1)+1<br>6(i-1) | Cylos Consumed<br>6(i-1)+2<br>6(i-1)+2 |
|--------------|----------------------------------------|----------------------------------------|
| r            | 0                                      | 6(i-1)                                 |
| a            | 0                                      | 6(i-1)                                 |
| a            | 6(i-1)+3                               | 6(i-1)+4                               |

Regular Naded Kgetare [6(i-1)+1,6(i-1)+2] [6(i-1)+1,6(i-1)+2] [a,6(N-1)] [1,6(N-1)] [6(i+1)+4,6(i-1)+4]



It so an intered graph, e.g.

to, 10]

x \lefta \[ [0, 10] \]

t, \lefta \[ [0, 1] \]

 $t_1 \leftrightarrow L_0, 1$   $t_2 \leftrightarrow L_0, 1$   $a \leftrightarrow L_2, 3$ 





- b) 3 cycles (a, b, +)
- c) Resource contrained (ist scheduling
- d) Call bound B.
  Fortherty
  First, calculate (win) ALAP times of ungerey:

|   | S(v) |
|---|------|
| a | D    |
| 6 | Ì    |
| C | 2    |
| L | 2    |
| و | 2    |
| f | 2    |
| 7 | 2    |

For 
$$B=1$$

$$S(a) = 0$$

$$S(b) = 1$$

$$S(c) = 2 \quad \text{for } |a| |a|, f |a| |a|$$

$$S(d) = 3$$

$$S(e) = 4$$

$$S(f) = 5$$

$$S(g) = 6$$

for 
$$B = 2$$
  
 $S(a) = 0$   
 $S(g) = 0$   
 $S(b) = 1$   
 $S(c) = 1$  (by ld, br) e/V  
 $S(d) = 2$   
 $S(e) = 2$   
 $S(f) = 3$ 

For 
$$B = 3$$
  
 $S(a) = 0$   
 $S(a) = 0$   
 $S(b) = 1$   
 $S(c) = 1$   
 $S(d) = 1$   
 $S(e) = 2$   
 $S(f) = 2$   
 $S(g) = 2$ 

$$f(x) = 4$$
  
 $f(x) = 0$   
 $f(x)$ 

For B74, result will be some as for B=4.

$$\lambda = 3$$

$$S(\alpha) = 0$$

$$S(b) = 1$$

$$S(c) = S(d) = S(e) = S(f) = S(g) = 2$$

$$\lambda = 4$$

$$S(a) = 0$$

$$S(b) = 1$$

$$s(c) = 2$$

$$S(d) = S(e) = S(f) = S(g) = 3$$

$$\lambda = S$$

$$y(a) = 0$$

$$s(a) = 3$$

$$S(e) = S(f) = S(g) = 4$$

$$\lambda = 6$$

$$S(x) = S(y) = 5$$

$$\lambda = 7$$

$$\mathcal{L}(c) = 2$$

$$S(d) = 3$$

$$S(g) = 6$$

For 1>7, result

will be as por

$$\lambda = 7$$



- n) An exprise design is one that is worse than other designs in it least one dijective, and no better in the remaining effectives.
- Yes LC-list-scheduling because it puts -074 allocation of resources cutil it is pred to be ALAP times, result; in more resources in the later time slots.
  - inproved in one objective without societing another objective.
  - (1) No. Both algorithms are poly-time, but scheduling is

- 5 a) A recover dominted circuit is one whose computational resources dominted the calculation of onea, power, speed, etc. This is not recovere dominated.
  - b) N=2

() ling labelled (1-4) above can be made idetical. (A) lies labelled (5-7) ove idetical. (B) line (8) is superplusars

a) One possible iden is to group by justical

Group 1

Group Z

e) 
$$N = KG^{\beta}$$
  
where  $N = N_0$  pris  
 $G = N_0$  gates  
 $K, \beta$ : Red contacts

This winit has 16 input bits & 48 output bits N = 16 + 48 = 64

$$64 = 29^{1/2}$$

$$= 9 = 32^{2}$$

$$9 = 1024$$

So ~1024 Gutes